variational inference framework
Review for NeurIPS paper: Learning to Learn Variational Semantic Memory
Correctness: As mentioned above, I am a bit skeptical about the technical correctness of the variational inference framework. Specifically, I think the latent z in Eq. (2) does not properly represent the class prototypes, as z is conditioned on each individual x, not on an entire class set. (On the other hand, Figure 1 shows the latent z conditioned on each of the class sets, so I am confused about which one is right.) I don't understand how the approximate posterior q(z | S) can depend on S, because according to the generative process defined by Eq. (2), the true posterior p(z | x, y) does not depend on the entire class set S, only on each individual point (x, y). If it is not included, then the inference of m should be based on semi-implicit variational inference (SIVI) [2,3], as the intermediate stochastic variable m appears only in the approximate posterior. However, the paper does not discuss this, and the ELBO expression in Eq. (13) does not seem to represent the SIVI procedure either.
Reviews: VIREL: A Variational Inference Framework for Reinforcement Learning
This paper brings a novel perspective on probabilistic frameworks for new reinforcement learning algorithms, and the adaptive temperature reweighting may lead to more insightful exploration built into our RL algorithms. The paper is clearly written, well organized, and easy to understand. The appendix is clearly structured as well, although its full length makes the paper a little unwieldy to read. The authors have clearly put a lot of work into developing the theory and presentation in this paper. Although empirically the performance of the derived algorithms does not show significant improvement over max-ent RL methods (with twin Q functions as in TD3), the approach is interesting and I believe this paper would be well suited for NeurIPS. Some specific comments: - In the definition of the residual error on L147, over what distribution is the L_p norm taken? - Instead of e_w being a global constant, have the authors considered parametrizing e_w as a function of h? This would allow for state-adaptive uncertainty and exploration, and I believe a majority of the results would still hold. However, most works with the Max-Ent framework parametrize variational distributions through only the action distributions, and fix the variational distribution on dynamics to the actual dynamics model.
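The temperature comment can be illustrated with a minimal sketch, assuming a softmax (Boltzmann) policy over action values in which a single scalar temperature plays the role the review attributes to e_w. The function names and the Q-values below are hypothetical and are not taken from the paper.

```python
import math

def boltzmann_policy(q_values, temperature):
    """Return action probabilities proportional to exp(Q / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical action values for a single state (assumption, for illustration).
q_values = [1.0, 2.0, 1.5]
for temperature in (10.0, 1.0, 0.01):
    probs = boltzmann_policy(q_values, temperature)
    print(temperature, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

A high temperature gives a nearly uniform, exploratory policy, while a temperature close to zero concentrates almost all probability on the highest-valued action. Making the temperature a function of the state, as the review suggests, would amount to passing a different `temperature` per state.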
VIREL: A Variational Inference Framework for Reinforcement Learning
Applying probabilistic models to reinforcement learning (RL) enables the use of powerful optimisation tools such as variational inference in RL. However, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, e.g., the lack of mode-capturing behaviour in pseudo-likelihood methods, difficulties learning deterministic policies in maximum entropy RL based approaches, and a lack of analysis when function approximators are used. We propose VIREL, a theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP, generalising existing approaches. VIREL also benefits from a mode-seeking form of KL divergence, the ability to learn deterministic optimal policies naturally from inference, and the ability to optimise value functions and policies in separate, iterative steps. In applying variational expectation-maximisation to VIREL, we thus show that the actor-critic algorithm can be reduced to expectation-maximisation, with policy improvement equivalent to an E-step and policy evaluation to an M-step.
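The phrase "mode-seeking form of KL divergence" can be illustrated with a small, generic sketch. It assumes a toy bimodal target on an integer grid and a family of single-bump approximations with a fixed width; none of these choices come from the paper. The direction usually described as mode-seeking is KL(q || p), with the approximation q in the first argument.

```python
import math

GRID = range(21)  # toy support: integers 0..20 (assumption, for illustration)

def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]

def discretised_gaussian(mean, std=1.0):
    """Gaussian bump evaluated on GRID and renormalised to sum to one."""
    return normalise([math.exp(-0.5 * ((x - mean) / std) ** 2) for x in GRID])

def kl(a, b):
    """KL(a || b) for two distributions on the same grid."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0.0)

# Bimodal target: an equal mixture of bumps centred at 5 and 15.
target = [0.5 * u + 0.5 * v
          for u, v in zip(discretised_gaussian(5), discretised_gaussian(15))]

# Candidate approximations: one unimodal bump per grid point.
candidates = {m: discretised_gaussian(m) for m in GRID}
reverse_best = min(candidates, key=lambda m: kl(candidates[m], target))
forward_best = min(candidates, key=lambda m: kl(target, candidates[m]))
print("reverse KL(q || p) picks mean", reverse_best)
print("forward KL(p || q) picks mean", forward_best)
```

In this toy setting the reverse direction places the bump on one of the two modes, while the forward direction places it between them, where the target has almost no mass.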
Variational Inference for Quantifying Inter-observer Variability in Segmentation of Anatomical Structures
Liu, Xiaofeng, Xing, Fangxu, Marin, Thibault, Fakhri, Georges El, Woo, Jonghye
Lesions or organ boundaries visible through medical imaging data are often ambiguous, thus resulting in significant variations in multi-reader delineations, i.e., the source of aleatoric uncertainty. In particular, quantifying the inter-observer variability of manual annotations with Magnetic Resonance (MR) Imaging data plays a crucial role in establishing a reference standard for various diagnosis and treatment tasks. Most segmentation methods, however, simply model a mapping from an image to its single segmentation map and do not take the disagreement of annotators into consideration. In order to account for inter-observer variability, without sacrificing accuracy, we propose a novel variational inference framework to model the distribution of plausible segmentation maps, given a specific MR image, which explicitly represents the multi-reader variability. Specifically, we resort to a latent vector to encode the multi-reader variability and counteract the inherent information loss in the imaging data. Then, we apply a variational autoencoder network and optimize its evidence lower bound (ELBO) to efficiently approximate the distribution of the segmentation map, given an MR image. Experimental results, carried out with the QUBIQ brain growth MRI segmentation datasets with seven annotators, demonstrate the effectiveness of our approach.
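As a rough illustration of the ELBO mentioned in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes a diagonal Gaussian approximate posterior, a standard normal prior on the latent vector, and an independent per-pixel Bernoulli likelihood for a binary segmentation mask. The decoder is a hypothetical stand-in for a network conditioned on the MR image, and all names are illustrative.

```python
import math
import random

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def bernoulli_log_likelihood(mask, probs, eps=1e-7):
    """Log-probability of a binary mask under independent per-pixel Bernoullis."""
    total = 0.0
    for y, p in zip(mask, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def sample_latent(mu, log_var, rng):
    """Reparameterised sample z = mu + sigma * noise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def elbo(mask, decode, mu, log_var, rng, n_samples=8):
    """Monte Carlo estimate of E_q[log p(mask | z)] - KL(q || prior)."""
    recon = 0.0
    for _ in range(n_samples):
        z = sample_latent(mu, log_var, rng)
        recon += bernoulli_log_likelihood(mask, decode(z))
    return recon / n_samples - gaussian_kl_to_standard_normal(mu, log_var)

def toy_decoder(z):
    # Hypothetical decoder: maps a 1-D latent to four pixel probabilities.
    logits = [2.0 + z[0], -2.0 + z[0], 2.0 - z[0], -2.0 - z[0]]
    return [1.0 / (1.0 + math.exp(-l)) for l in logits]

rng = random.Random(0)
mask = [1, 0, 1, 0]  # one annotator's (toy) segmentation
print(elbo(mask, toy_decoder, mu=[0.0], log_var=[0.0], rng=rng))
```

Sampling different latent vectors and decoding each one yields different plausible masks for the same image, which is how a model of this kind can represent disagreement between annotators.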
VIREL: A Variational Inference Framework for Reinforcement Learning
Fellows, Matthew, Mahajan, Anuj, Rudner, Tim G. J., Whiteson, Shimon
Applying probabilistic models to reinforcement learning (RL) enables the use of powerful optimisation tools such as variational inference in RL. However, existing inference frameworks and their algorithms pose significant challenges for learning optimal policies, e.g., the lack of mode-capturing behaviour in pseudo-likelihood methods, difficulties learning deterministic policies in maximum entropy RL based approaches, and a lack of analysis when function approximators are used. We propose VIREL, a theoretically grounded probabilistic inference framework for RL that utilises a parametrised action-value function to summarise future dynamics of the underlying MDP, generalising existing approaches. VIREL also benefits from a mode-seeking form of KL divergence, the ability to learn deterministic optimal policies naturally from inference, and the ability to optimise value functions and policies in separate, iterative steps. In applying variational expectation-maximisation to VIREL, we thus show that the actor-critic algorithm can be reduced to expectation-maximisation, with policy improvement equivalent to an E-step and policy evaluation to an M-step.